Evaluation of genomic offset predictions in common gardens
Author
Juliette Archambeau
Published
May 30, 2023
In this report, we aim to evaluate the genomic offset predictions from the different GEA methods (LFMM, GF, GDM and RDA) with phenotypic data from five common gardens located in Spain (Asturias, Madrid, Cáceres), Portugal (Fundão) and France (Pierroton). For that, for each GEA method, we predicted the genomic offset of each population when transplanted in the environment of the common garden (so instead of predicting the genomic offset into future climates, we predict it in the new environment of the common garden, i.e. space-for-time approach). The rational is that populations with the highest genomic offset in each common garden are expected to have a lower fitness in the common gardens. In this report, we use tree height and mortality in five common gardens as proxies of fitness.
In comparison, we also estimate the climatic transfer distances (CTD) between the climate-of-origin of the populations and the climate of the common garden (i.e. the absolute value of the difference between the climate of origin of the populations and the climate in the common garden between the tree planting date and the measurement date) and we estimate the association between tree height and mortality and the climatic transfer distances. The climatic transfer distances are calculated for the same set of climatic variables as the one used to calculate the genomic offset (one CTD per climatic variable).
1 Data
1.1 Genomic offset predictions
We load the genomic offset predictions estimated from the different GEAs.
# selected climatic variablesclim_var <-readRDS(here("data/ClimaticData/NamesSelectedVariables.rds"))# climatic data in the common gardens (btw planting and measurement dates)cg_clim <-readRDS(here("data/ClimaticData/CommonGardens/ClimateCG.rds")) %>% dplyr::select(cg,any_of(clim_var)) # Loading point estimate climatic dataadj <-"noADJ"# not adjusted for elevationref_period <-"ref_1901_1950"# reference period 1901-1950clim_past <-readRDS(here(paste0("data/ClimaticData/MaritimePinePops/ClimatePopulationLocationPointEstimates_ReferencePeriods_",adj,".rds")))[[ref_period]]$ref_means %>% dplyr::select(pop,any_of(clim_var))df <-lapply(cg_clim[["cg"]], function(x){for(var in clim_var){ clim_past[[var]] <- ( clim_past[[var]] - cg_clim %>%filter(cg == x) %>%pull(var) ) %>%abs()} return(clim_past)}) %>%setNames(cg_clim[["cg"]]) %>%bind_rows(.id="cg") %>%pivot_longer(cols=any_of(clim_var),names_to="method_input",values_to="varX") %>%mutate(method_type="CTD",method=paste0("CTD_",method_input)) %>%bind_rows(df)
1.3 Phenotypic data
We load the survival and mortality data from the five common gardens.
Code
pheno_data <-readRDS(file=here("data/CommonGardenData/PhenoDataNovember2019_AnnualTraits_UpdatedSept2021_AllSites.rds")) %>% dplyr::rename(pop=prov)no_nas <-sapply(pheno_data, function(x) length(x)-sum(is.na(x)))list_pheno <-list()list_pheno$`Asturias (37 months)`<-table(pheno_data$site,pheno_data$AST_survmar14)["asturias",]list_pheno$`Bordeaux (85 months)`<-table(pheno_data$site,pheno_data$BDX_surv18)["bordeaux",]list_pheno$`Cáceres (8 months)`<-table(pheno_data$site,pheno_data$CAC_survdec11)["caceres",]list_pheno$`Madrid (13 months)`<-table(pheno_data$site,pheno_data$MAD_survdec11)["madrid",]list_pheno$`Fundão (27 months)`<-table(pheno_data$site,pheno_data$POR_survmay13)["portugal",]list_pheno %>%bind_rows(.id="cg") %>%setNames(c("Common garden (tree age)","Nb of dead trees","Nb of trees alive")) %>%mutate("Nb of height measurements"=c(no_nas[["AST_htmar14"]], no_nas[["BDX_htnov18"]], no_nas[["CAC_htdec11"]], no_nas[["MAD_htdec11"]], no_nas[["POR_htmay13"]])) %>% kable_mydf
Common garden (tree age)
Nb of dead trees
Nb of trees alive
Nb of height measurements
Asturias (37 months)
206
3976
3976
Bordeaux (85 months)
119
3225
3217
Cáceres (8 months)
3818
366
340
Madrid (13 months)
3138
1046
1046
Fundão (27 months)
1517
2666
2665
Height and survival measurements in each common garden:
Asturias (Spain) in March 2014 when the trees were 37 month-old (trees were planted in February 2011).
Bordeaux (France) in November 2018 when the trees were 85 month-old (trees were planted in October 2011).
Cáceres (Spain) in December 2011 when the trees were 8 month-old (trees were planted in April 2011). Note that for this common garden, we calculate the bioclimatic variables for the entire year 2011 (instead of calculating the variables only for months between April and December). Indeed, the calculation of the annual bioclimatic variables will be wrong if we do not account for some months, e.g. the mean annual temperature will be higher than expected because we do not account for some winter months.
Madrid (Spain) in December 2011 when the trees were 13 month-old (trees were planted in November 2010).
Fundão (Portugal) in May 2013 when the trees were 27 month-old (trees were planted in February 2011)
2 Mortality models
In this section, we want to determine whether genomic offset (GO) or climate transfer distances (CTD) are associated with the proportion of dead trees in the populations, independently in the five common gardens. We build a model that assumes that the initial sapling height acts as a confounder. Indeed, trees that were higher at the time of planting have a higher probability of survival. This is particularly true in Madrid and Cáceres (Spain) where the seedlings experienced an extreme drought event the same year they were planted, resulting in very high mortality rates. Here is the model:
with \(a_{p}\) the count of individual that died in the population \(p\), \(N_{p}\) the total number of individuals in the population \(p\) (=number of individuals that were initially planted in the common garden), \(p_p\) is the estimated probability of mortality in the population \(p\), \(X_{p}\) is the genomic offset or climatic transfer distance for the population \(p\) and \(H_{p}\) is the BLUPs for height of the population \(p\)(population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. 2022). \(H_{p}\) is used as a proxy of the initial tree height.
We load the population-specific intercepts from the model 1 of Archambeau et al. (2022).
Code
mod_arch2022 <-readRDS(file=here("data/Archambeauetal2022_MOD1.rds"))pop_heights <- mod_arch2022$fit %>% broom.mixed::tidyMCMC(estimate.method ="mean",conf.int = T) %>%# we take the mean of the prov random interceptsfilter(str_detect(term, "^(r_prov\\[)")) %>% dplyr::rename(height=estimate,pop=term) %>%mutate(pop=str_sub(pop,8,-12))pop_heights[1:5,] %>% kable_mydf
pop
height
std.error
conf.low
conf.high
ALT
0.09
0.04
0.01
0.17
ARM
0.12
0.04
0.04
0.20
ARN
-0.09
0.03
-0.16
-0.03
BAY
-0.14
0.03
-0.20
-0.07
BON
-0.04
0.04
-0.12
0.04
2.1.2 Survival data
The response variable is counts of dead trees. To calculate these counts, we load the survival data in the common gardens, in which 0 corresponds to dead trees and 1 to survivors.
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
int nb_tot[N]; // Total number of trees in the population
int nb_dead[N]; // Number of dead trees in the population
vector[N] H; // Mean tree height of the population
vector[N] X; // Genomic offset or climatic transfer distance of the population
}
parameters {
real beta_0;
real beta_H;
real beta_X;
}
model {
nb_dead ~ binomial_logit(nb_tot,beta_0 + beta_H * H + beta_X * X);
beta_0 ~ normal(0,5);//std_normal();
beta_H ~ normal(0,5);//std_normal();
beta_X ~ normal(0,5);//std_normal();
}
generated quantities{
vector[N] log_lik;
// log likelihood for loo
for (n in 1:N) {
log_lik[n] = binomial_logit_lpmf( nb_dead[n] | nb_tot[n] , beta_0 + beta_H * H[n] + beta_X * X[n]);
}
}
We run the models for each common garden and each genomic offset predictions/climatic transfer distances, so a total of 5 * 15 = 75 models runs.
Code
coefftab <-lapply(unique(survival_data$site),function(site_i){lapply(unique(df$method), function(method_i){# Subset the data for the site i and method i sub_data <- survival_data %>%filter(site == site_i) %>%group_by(pop) %>% dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) # transform survival data into mortality data sub_data <- df %>%filter(method == method_i & cg == site_i) %>%inner_join(sub_data, by="pop") %>%inner_join(pop_heights %>% dplyr::select(any_of(c("height", "pop"))), by="pop") %>%arrange(pop)# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot=sub_data$nb_tot,H=(sub_data$height-mean(sub_data$height))/sd(sub_data$height),X=(sub_data$varX -mean(sub_data$varX))/sd(sub_data$varX))# Running the model mod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4, save_warmup =FALSE) #loo(mod) %>% saveRDS(file=here(paste0("outputs/ValidationCommonGarden/MortalityModels/loos/loo_",site_i,"_",method_i,".rds")))# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL, estimate.method ="median", ess = F, rhat = F, conf.int = T,conf.level =0.95) %>%filter(str_detect(term, c('beta'))) %>% dplyr::rename(median=estimate,std_error=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(cg=site_i,method=method_i,.before=1) }) %>%bind_rows()}) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationCommonGarden/MortalityModels/coefftab.rds"))
the \(\beta_X\) coefficients, which stand for the association between the counts of dead trees and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations when transplanted in the new environment of the common gardens) or the climatic tranfer distances.
the \(\beta_H\) coefficients, which stand for the association between the counts of dead trees and the BLUPs for height in the five common gardens, used as a proxy of the initial seedling height (a confounder in the model).
As expected, seedlings that were higher at the planting date had a higher probability of survival in Fundão, Cáceres and Madrid, the three common gardens in which mortality rates were considerable. In Bordeaux and Asturias, mortality rates were very low, and there was no association with the initial seedling height, so we may consider that mortality events in these two common gardens were more probably random events than due to climatic or other environmental conditions. So, to evaluate the association between genomic offset predictions/climatic transfer distances and the counts of dead trees, we only interpret the results in Fundão, Cáceres and Madrid.
In Fundão, Cáceres and Madrid, for most genomic offset predictions, populations experiencing higher mortality rates were also those showing higher genomic offset. The most consistent associations across the three common gardens were obtained with:
Gradient Forest (GF) with both candidate and control SNPs.
Redundancy analysis (RDA) with both candidate and control SNPs.
Latent factor mixed model (LFMM) with all SNPs or control SNPs.
Interestingly, climatic transfer distances generally explain mortality probability less well than genomic offset predictions, and none of them show a consistent association with the number of dead trees across the three common gardens. Note that it is almost the case for the climatic transfer distance calculated based on the temperature seasonality, which shows a positive association with the counts of dead trees in Madrid and Fundão, and almost in Cáceres.
2.4 Experimental design
We export in a table with the number and proportion of dead trees for each population in each common garden.
Code
ExpDesignTab <-lapply(unique(survival_data$site),function(site_i){survival_data %>%filter(site==site_i) %>% dplyr::select(pop,survival) %>%drop_na() %>%group_by(pop) %>% dplyr::summarise(nb_dead=n()-sum(survival),nb_tot=n()) %>%mutate(prop_dead=nb_dead*100/nb_tot)}) %>%setNames(unique(survival_data$site)) %>%bind_rows(.id="cg") %>%pivot_wider(names_from=cg,values_from =c(nb_dead, nb_tot,prop_dead),names_sep="_") %>% dplyr::select(pop,contains("asturias"),contains("bordeaux"),contains("caceres"),contains("madrid"),contains("portugal"))# Generate the latex table for the Supplementary Informationprint(xtable(ExpDesignTab, type ="latex",digits=2), file =paste0(here("tables/ExperimentalDesignTablesSurvivalCommonGarden.tex")), include.rownames=FALSE)ExpDesignTab %>%kable_mydf()
pop
nb_dead_asturias
nb_tot_asturias
prop_dead_asturias
nb_dead_bordeaux
nb_tot_bordeaux
prop_dead_bordeaux
nb_dead_caceres
nb_tot_caceres
prop_dead_caceres
nb_dead_madrid
nb_tot_madrid
prop_dead_madrid
nb_dead_portugal
nb_tot_portugal
prop_dead_portugal
ALT
3
72
4.17
1
53
1.89
69
72
95.83
52
72
72.22
19
72
26.39
ARM
1
64
1.56
1
64
1.56
56
64
87.50
49
64
76.56
23
64
35.94
ARN
5
136
3.68
3
109
2.75
129
136
94.85
107
136
78.68
46
136
33.82
BAY
8
144
5.56
3
142
2.11
132
144
91.67
113
144
78.47
56
144
38.89
BON
2
71
2.82
1
48
2.08
62
72
86.11
48
72
66.67
14
72
19.44
CAD
5
80
6.25
3
70
4.29
74
80
92.50
62
80
77.50
22
80
27.50
CAR
0
48
0.00
0
46
0.00
44
48
91.67
36
48
75.00
21
48
43.75
CAS
4
80
5.00
6
78
7.69
78
80
97.50
66
80
82.50
31
80
38.75
CEN
2
72
2.78
1
38
2.63
66
72
91.67
50
72
69.44
18
72
25.00
COC
6
144
4.17
3
109
2.75
137
144
95.14
118
144
81.94
53
143
37.06
COM
0
32
0.00
3
32
9.38
28
32
87.50
21
32
65.62
10
32
31.25
CUE
9
224
4.02
9
194
4.64
211
224
94.20
186
224
83.04
101
224
45.09
HOU
11
208
5.29
6
160
3.75
186
208
89.42
145
208
69.71
66
208
31.73
LAM
3
72
4.17
4
60
6.67
68
72
94.44
57
72
79.17
27
72
37.50
LEI
14
184
7.61
10
143
6.99
162
184
88.04
147
184
79.89
64
184
34.78
MAD
0
8
0.00
0
8
0.00
8
8
100.00
5
8
62.50
2
8
25.00
MIM
7
144
4.86
4
126
3.17
135
144
93.75
116
144
80.56
67
144
46.53
OLB
3
176
1.70
4
116
3.45
149
176
84.66
113
176
64.20
27
176
15.34
OLO
22
191
11.52
3
132
2.27
173
192
90.10
145
192
75.52
73
192
38.02
ORI
4
208
1.92
2
183
1.09
196
208
94.23
162
208
77.88
67
208
32.21
PET
11
192
5.73
3
147
2.04
171
192
89.06
136
192
70.83
75
192
39.06
PIA
8
128
6.25
1
110
0.91
111
128
86.72
91
128
71.09
43
128
33.59
PIE
3
72
4.17
4
55
7.27
67
72
93.06
50
72
69.44
17
72
23.61
PLE
8
160
5.00
9
138
6.52
155
160
96.88
131
160
81.88
71
160
44.38
PUE
1
64
1.56
5
56
8.93
58
64
90.62
50
64
78.12
30
64
46.88
QUA
5
136
3.68
3
117
2.56
123
136
90.44
83
136
61.03
36
136
26.47
SAC
1
72
1.39
2
51
3.92
65
72
90.28
56
72
77.78
35
72
48.61
SAL
7
112
6.25
0
69
0.00
104
112
92.86
85
112
75.89
55
112
49.11
SEG
12
168
7.14
7
168
4.17
154
168
91.67
135
168
80.36
77
168
45.83
SIE
2
64
3.12
1
47
2.13
59
64
92.19
40
64
62.50
29
64
45.31
STJ
15
224
6.70
3
180
1.67
190
224
84.82
154
224
68.75
70
224
31.25
TAM
8
120
6.67
4
71
5.63
114
120
95.00
107
120
89.17
60
120
50.00
VAL
5
96
5.21
5
64
7.81
87
96
90.62
69
96
71.88
29
96
30.21
VER
11
216
5.09
5
160
3.12
197
216
91.20
153
216
70.83
83
216
38.43
Code
# Information used in the manuscriptExpDesignTab[,-1] %>% dplyr::summarise_all(mean) %>% kable_mydf
nb_dead_asturias
nb_tot_asturias
prop_dead_asturias
nb_dead_bordeaux
nb_tot_bordeaux
prop_dead_bordeaux
nb_dead_caceres
nb_tot_caceres
prop_dead_caceres
nb_dead_madrid
nb_tot_madrid
prop_dead_madrid
nb_dead_portugal
nb_tot_portugal
prop_dead_portugal
6.06
123
4.27
3.5
98.35
3.7
112.29
123.06
91.65
92.29
123.06
74.31
44.62
123.03
35.79
3 Height models
3.1 No confounder
3.1.1 Mathematical model
In this section, we want to determine whether genomic offset (GO) predictions or climate transfer distances (CTD) are associated with tree height, independently in the five common gardens. In each common garden, we perform the following model:
with \(Y_ipb\) is the height of the individual \(i\) in the population \(p\) in the block \(b\) and \(X_{p}\) is the value of the variable of interest (GO or CTD) in the population \(p\). We include a quadratic term for \(X_p\) to allow for potential nonlinearity in the response, following Fitzpatrick et al. (2021).
S4 class stanmodel 'anon_model' coded as follows:
data {
int N; // Number of individuals
vector[N] Y; // Response variable (individual tree height)
vector[N] X; // Genomic offset or climatic transfer distances
int<lower=0> nb_bloc; // Number of blocks
int<lower=0, upper=nb_bloc> bloc[N]; // Blocks
}
parameters {
vector[nb_bloc] alpha_bloc; // intercepts of blocks
real beta_X1;
real beta_X2;
real<lower = 0> sigma_r;
}
transformed parameters {
vector[N] mu; // linear predictor
real R_squared; // R^2 to evaluate the goodness of fit of the model
mu = alpha_bloc[bloc] + beta_X1 * X + beta_X2 * square(X);
R_squared = 1 - variance(Y - mu) / variance(Y);
}
model {
Y ~ normal(mu, sigma_r); // Likelihood
sigma_r ~ exponential(1);
alpha_bloc ~ std_normal();
beta_X1 ~ std_normal();
beta_X2 ~ std_normal();
}
generated quantities{
// log likelihood for loo
vector[N] log_lik;
vector[N] muhat;
for (n in 1:N) {
log_lik[n] = normal_lpdf(Y[n] |mu[n],sigma_r);
muhat[n] = normal_rng(mu[n], sigma_r);
}
}
Below, we plot the 95% credible intervals of the \(\beta_{X_1}\) and \(\beta_{X_2}\) coefficients, which stand for the association between tree height and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations when transplanted in the new environment of the common gardens) or the climatic tranfer distances.
Higher genomic offset predictions are consistently associated with lower tree height only in Asturias. Most genomic offset predictions are also associated with lower tree height in Bordeaux.
In Fundão, Cáceres and Madrid, the association between genomic offset predictions and tree height are often in the opposite direction as expected.
Interestingly, the climatic transfer distances based in the mean annual temperature and the annual precipitation are negatively associated with tree height in all common gardens (except Cáceres for the CTD based on annual precipitation). These CTD may therefore be better predictors of tree height in common gardens than the genomic offset predictions.
Finally, I think that using tree height as a proxy for fitness to evaluate genomic offset predictions in common gardens may not be appropriate in maritime pine because this trait has a very low genetic-by-environment interaction (see previous papers). Therefore, the association between genomic offset predictions and tree height in Asturias and Bordeaux may not be due to a higher fitness of the trees that are best adapted to the climate in these common gardens, but more probably to height differences that are common across all environments (i.e. populations from Atlantic climates are generally taller). This is confirmed by the models below in which the population-specific BLUPs for height across all common gardens was included as a confounder. When included, the association between genomic offset predictions and tree height in Asturias almost disappears and either disappears or goes in the opposite direction as expected in Bordeaux. Except the genomic offset predictions based on the RDA remain negatively associated with tree height in Bordeaux and Asturias.
Therefore, I am not sure we can go very far in this validation step. We may retain that genomic offset predictions seem to show the most consistent association with tree height when generated with the RDA, and that this association is robust (though smaller) even when the fixed genetic differences across populations are included as confounder.
with \(H_p\) is a proxy of the initial height of the populations when trees were planted. More precisely, it is the BLUPs for height of the population \(p\)(population varying intercepts calculated across all common gardens in the model 1 of Archambeau et al. 2022).
Archambeau, Juliette, Marta Benito Garzón, Frédéric Barraquand, Marina de Miguel, Christophe Plomion, and Santiago C González-Martı́nez. 2022. “Combining Climatic and Genomic Data Improves Range-Wide Tree Height Growth Prediction in a Forest Tree.”The American Naturalist 200 (4): E141–59.
Fitzpatrick, Matthew C, Vikram E Chhatre, Raju Y Soolanayakanahally, and Stephen R Keller. 2021. “Experimental Support for Genomic Prediction of Climate Maladaptation Using the Machine Learning Approach Gradient Forests.”Molecular Ecology Resources.